102 research outputs found

    A Framework for Aggregating Private and Public Web Archives

    Personal and private Web archives are proliferating due to the increase in the tools to create them and the realization that Internet Archive and other public Web archives are unable to capture personalized (e.g., Facebook) and private (e.g., banking) Web pages. We introduce a framework to mitigate issues of aggregation in private, personal, and public Web archives without compromising potentially sensitive information contained in private captures. We amend Memento syntax and semantics to allow TimeMap enrichment to account for additional attributes to be expressed inclusive of the requirements for dereferencing private Web archive captures. We provide a method to involve the user further in the negotiation of archival captures in dimensions beyond time. We introduce a model for archival querying precedence and short-circuiting, as needed when aggregating private and personal Web archive captures with those from public Web archives through Memento. Negotiation of this sort is novel to Web archiving and allows for the more seamless aggregation of various types of Web archives to convey a more accurate picture of the past Web. Comment: Preprint version of the ACM/IEEE Joint Conference on Digital Libraries (JCDL 2018) full paper, accessible at the DO

    Aggregator Reuse and Extension for Richer Web Archive Interaction

    Memento aggregators enable users to query multiple web archives for captures of a URI in time through a single HTTP endpoint. While this one-to-many access point is useful for researchers and end-users, aggregators are in a position to provide additional functionality to end-users beyond black box style aggregation. This paper identifies the state-of-the-art of Memento aggregation, abstracts its processes, highlights shortcomings, and offers systematic enhancements. Comment: 16 pages, preprint accepted to appear in Proceedings of the 24th International Conference on Asia-Pacific Digital Libraries (ICADL 2022

    WARCreate: Create Wayback-Consumable WARC Files From Any Webpage

    The Internet Archive's Wayback Machine is the most common way that typical users interact with web archives. The Internet Archive uses the Heritrix web crawler to transform pages on the publicly available web into Web ARChive (WARC) files, which can then be accessed using the Wayback Machine. Because Heritrix can only access the publicly available web, many personal pages (e.g. password-protected pages, social media pages) cannot be easily archived into the standard WARC format. We have created a Google Chrome extension, WARCreate, that allows a user to create a WARC file from any webpage. Using this tool, content that might have been otherwise lost in time can be archived in a standard format by any user. This tool provides a way for casual users to easily create archives of personal online content. This is one of the first steps in resolving issues of long term storage, maintenance, and access of personal digital assets that have emotional, intellectual, and historical value to individuals

    On Identifying Points of Semantic Shift Across Domains

    The semantics used for particular terms in an academic field organically evolve over time. Tracking this evolution through inspection of published literature has either been from the perspective of Linguistic scholars or has concentrated the focus of term evolution within a single domain of study. In this paper, we performed a case study to identify semantic evolution across different domains and identify examples of inter-domain semantic shifts. We initially used keywords as the basis of our search and executed an iterative process of following citations to find the initial mention of the concepts in the field. We found that a select set of keywords like "semaphore", "polymorphism", and "ontology" were mentioned within Computer Science literature and tracked the seminal study that borrowed those terms from original fields by citations. We marked these events as semantic evolution points. Through this manual investigation method, we can identify term evolution across different academic fields. This study reports our initial findings that will seed future automated and computational methods of incorporating concepts from additional academic fields. Comment: In 17th International Conference on Metadata and Semantics Research, October 202

    Modeling Ephraim Chambers' Knowledge Structure from a Naïve Standpoint

    In the preface to his Cyclopaedia, published in 1728, Ephraim Chambers offers readers a systematized structure of his attempt to produce a universal repository of human knowledge. Divided into an interconnected taxonomic tree and domain vocabulary, this structure forms the basis of one effort from the Metadata Research Center to study historical ontologies. The knowledge structure is being encoded into a Simple Knowledge Organization System (SKOS) form as well as a Web Ontology Language (OWL) version. This paper explores the expressive and functional differences between these SKOS and OWL versions of Chambers’ knowledge structure. As part of this goal, the research focused on the construction and application of rules in each system to produce a more computationally ready representation of Chambers’ structure in SKOS, which is more thesaurus-like, and OWL, which represents additional ontological nuances. First, studying the various textual aspects at the semantic, syntactic, and typographic levels allowed the relationships between terms to manifest, from which rules governing the expression of the connections between elements developed. Second, because each language, SKOS and OWL, functionally expresses different logical relationships, their possibilities and limitations offer a ground for further analyzing the resultant knowledge structures, although each stemmed from the same basic source of Chambers’ text. Lastly, this paper will examine rule making and expression in light of Paul Grice’s theory of conversational implicature to understand how a naïve agent formulates and applies these rules to a knowledge structure

    Impact of URI Canonicalization on Memento Count

    Quantifying the captures of a URI over time is useful for researchers to identify the extent to which a Web page has been archived. Memento TimeMaps provide a format to list mementos (URI-Ms) for captures along with brief metadata, like Memento-Datetime, for each URI-M. However, when some URI-Ms are dereferenced, they simply provide a redirect to a different URI-M (instead of a unique representation at the datetime), often also present in the TimeMap. This implies that confidently obtaining an accurate count quantifying the number of non-forwarding captures for a URI-R is not possible using a TimeMap alone and that the magnitude of a TimeMap is not equivalent to the number of representations it identifies. In this work we discuss this particular phenomenon in depth. We also perform a breakdown of the dynamics of counting mementos for a particular URI-R (google.com) and quantify the prevalence of the various canonicalization patterns that exacerbate attempts at counting using only a TimeMap. For google.com we found that 84.9% of the URI-Ms result in an HTTP redirect when dereferenced. We expand on and apply this metric to TimeMaps for seven other URI-Rs of large Web sites and thirteen academic institutions. Using a ratio metric DI for the number of URI-Ms without redirects to those requiring a redirect when dereferenced, five of the eight large web sites' and two of the thirteen academic institutions' TimeMaps had a ratio less than one, indicating that more than half of the URI-Ms in these TimeMaps result in redirects when dereferenced. Comment: 43 pages, 8 figure
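
The ratio described in this abstract can be illustrated with a small sketch. It assumes the ratio is simply the count of URI-Ms that do not redirect divided by the count that do; the function names below are hypothetical and are not taken from the paper, whose exact definition of DI may differ.

```python
def redirect_ratio(non_redirecting: int, redirecting: int) -> float:
    """Ratio of URI-Ms served without a redirect to URI-Ms that redirect.

    Assumed reading of the abstract's ratio metric; names are illustrative.
    """
    if redirecting == 0:
        raise ValueError("ratio is undefined when no URI-M redirects")
    return non_redirecting / redirecting


def ratio_from_redirect_share(redirect_share: float) -> float:
    """Same ratio, computed from the fraction of URI-Ms that redirect."""
    if not 0 < redirect_share <= 1:
        raise ValueError("redirect_share must be in (0, 1]")
    return (1 - redirect_share) / redirect_share


# The abstract reports that 84.9% of google.com URI-Ms redirect,
# which under this reading gives a ratio of roughly 0.178.
google_ratio = ratio_from_redirect_share(0.849)

# A ratio below one means more than half of the URI-Ms redirect.
mostly_redirects = google_ratio < 1
```

Under this reading, a TimeMap in which exactly half of the URI-Ms redirect has a ratio of one, so a value below one corresponds to the "more than half" condition stated in the abstract.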

    WARCreate and WAIL: WARC, Wayback, and Heritrix Made Easy

    [First slide] The Problem Institutional Tools, Personal Archivists ON YOUR MACHINE -Complex to Operate -Require Infrastructure DELEGATED TO INSTITUTIONS -$ -Lose original perspective Locale content tailoring (DC vs. San Francisco) Observation Medium (PC web browser vs. Crawler

    WARCreate - Create Wayback-Consumable WARC Files From Any Webpage

    [First Slide] What is WARCreate? Google Chrome extension Creates WARC files Enables preservation by users from their browser First steps in bringing Institutional Archiving facilities to the P

    Client-Assisted Memento Aggregation Using The Prefer Header

    [First paragraph] Preservation of the Web ensures that future generations have a picture of how the web was. Web archives like Internet Archive's Wayback Machine, WebCite, and archive.is allow individuals to submit URIs to be archived, but the captures they preserve then reside at the archives. Traversing these captures in time as preserved by multiple archive sources (using Memento [8]) provides a more comprehensive picture of the past Web than relying on a single archive. Some content on the Web, such as content behind authentication, may be unsuitable or inaccessible for preservation by these organizations. Furthermore, this content may be inappropriate for the organizations to preserve due to reasons of privacy or exposure of personally identifiable information [4]. However, preserving this content would ensure an even more comprehensive picture of the web and may be useful for future historians who wish to analyze content beyond the capability or suitability of archives created to preserve the public Web